Abstract

The overarching objective of the proposed research was to extend the temporal context model by integrating a representation of the time at which stimuli were experienced, so that it could describe a broad variety of phenomena across several subfields of cognitive psychology. This was accomplished. We published a paper in Neural Computation that describes such a model and applies it to problems in episodic memory, timing, and classical conditioning. A more detailed version of this model, described in a paper revised for Psychological Review, extends it to a wider range of phenomena by introducing a translation operator, which allows the construction of trajectories of predicted future states, and a jump back in time, which allows an account of contiguity effects in episodic memory. In that paper we applied the model to problems from episodic memory and working memory, as well as to second-order conditioning problems from trace conditioning. We met the near-term goal of generating simulation code for the model of timing, implementing three different approaches to simulating the equations in the R programming language. The first simulates the time series of inputs as a series of delta functions. The second approximates the representation of history by implementing a partial differential equation that it obeys. The third, which is the most general but is subject to approximation errors, explicitly calculates the operator that inverts the Laplace transform. The model's sensitivity to noise was analyzed in the Neural Computation paper. We also made progress towards the longer-term goals of the proposal. Although the results are not yet published, we have analyzed the properties of the model as applied to semantic memory. Although we have not yet conducted the corresponding simulations, we successfully secured funding from the NSF to pursue that longer-term goal.


FA9550-10-1-0149 supported the development of a quantitative model of stimulus history and its application to numerous phenomena in cognitive psychology. The model rests on a mathematical formalism in which a stimulus history $f(\tau)$ is encoded by means of its Laplace transform and then reconstructed by an approximate inversion of that transform.

The research proposed several sub-projects, grouped roughly into near-term and medium-term projects. Here we summarize the final status of these sub-projects. After this brief summary, we describe the scientific content of the output of the research project in considerably more detail.
Near-term projects 

Near-term projects were anticipated to be completed in the first year of funding. All three were completed as anticipated.

1. Development of a simulation library for remembered time. The model began as a mathematical formulation. To facilitate comparison of the equations with behavioral and neural results, we proposed to develop a software simulation library implementing the equations. As it turned out, we developed three versions of the library in the R programming language, each corresponding to a different way of implementing the equations. We have provided these libraries to other investigators on request to facilitate widespread use of the model in scientific investigation. We have also assisted the Dennis lab at Ohio State in developing a Python library based on our R code.

2. Analytic study of recency and contiguity effects. The recency effect and the contiguity effect are two fundamental aspects of episodic memory. The recency effect refers to the finding that, all other things being equal, stimuli experienced more recently are better remembered than stimuli experienced less recently (Murdock, 1962; Glenberg et al., 1980). The contiguity effect refers to the finding that, all other things being equal, when one stimulus comes to mind, the next stimulus that comes to mind is likely to have been experienced close in time to the first (Kahana, 1996). Both phenomena have been observed over a variety of time scales (Glenberg et al., 1980; Howard & Kahana, 1999; Howard, Youker, & Venkatadass, 2008; Unsworth, 2008), suggesting a common mechanism that is not based on traditional ideas of short-term memory.

We proposed to analyze these findings in the context of our model of remembered time. The basic idea is that the current state of history serves as a cue to initiate recall, leading to the recency effect. Contiguity was hypothesized to arise because remembering a stimulus causes recovery of the state of history that obtained when that stimulus was initially experienced. Because the representation of history is scale-free, it should account for both findings over a variety of time scales.

The model’s treatment of the recency effect was first addressed in Shankar and Howard (2012). Recency worked as expected. The account of the contiguity effect was somewhat more involved and was presented in Howard, Shankar, Aue, and Criss (In preparation). Interestingly, the model predicts violations of strict scale-invariance in the contiguity effect that appear to be consistent with experimentally observed findings. Because of the centrality of contiguity in theories of episodic memory, we were pleased to contribute to two unanticipated side projects that extended this goal beyond our initial expectations. The first was the completion of revisions to a paper confirming neurophysiological predictions of the account of contiguity based on recovery of temporal information (Howard, Viskontas, Shankar, & Fried, In press). In that paper, the pattern of activity in ensembles of neurons in the human MTL showed evidence for a “jump back in time,” as we had anticipated. In the second, we helped design and analyze an empirical study that developed a new experimental methodology for measuring the contiguity effect behaviorally in human subjects (Kilic, Criss, & Howard, In press). Because that study established a causal relationship between the presentation of the cue and the recovered memories, its results cannot be accounted for by several competing explanations of the contiguity effect (Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann, & Usher, 2005; Grossberg & Pearson, 2008; Farrell, 2012).
3. Sensitivity to noise. This subproject was designed to evaluate the model’s resistance to
stochastic noise in its inputs. This is an important topic to investigate because if the model is too 
sensitive to disruptions, it would be unable to represent stimulus history over the long periods of 
time necessary to account for the behavioral phenomena under investigation. These analyses were 
conducted and reported in detail in Shankar and Howard (2012). Briefly, the representation of 
stimulus history is surprisingly robust to stochastic disruptions in the input patterns. We further 
examined the detailed response of the model to signals with various spectral signatures in Shankar 
and Howard (submitted). 

Medium-term projects 

Medium-term projects were not expected to be completed within a single year. We hoped to complete them within the two-year window. As it turned out, we successfully completed one of these subprojects and made strong progress on the second.

Temporal mapping in trace conditioning. Previous work in the mathematical modeling of memory focused either on human memory experiments in which subjects learn lists of words or on conditioning experiments in which rats learn to respond appropriately to stimuli presented on various schedules. This parochial approach seems to us poorly suited to the goal of developing a general theory of memory that applies across species. With this in mind, we proposed to describe phenomena from the temporal mapping literature (e.g., Cole, Barnet, & Miller, 1995), a conditioning paradigm in which rats learn to integrate temporal relationships between stimuli (see below for more detail).

We describe our treatment of temporal mapping in some detail below and in considerably more detail in Howard et al. (In preparation). Briefly, the solution turned out to require major advances in the mathematical framework. First, as anticipated, we required the same “jump back in time” that we used to account for the contiguity effect in episodic memory studies, suggesting a deep analogy between human list learning and animal classical conditioning studies. Second, we needed to construct a “translation operator” that enables the model to “play forward” anticipated future states of the world from the current state of history. This translation operator (described below) also, fortuitously, enabled an account of other behavioral phenomena, namely interval timing across scales and the “time-left procedure” from animal conditioning studies (see Howard et al., In preparation, for details).

Modeling the development of semantic representations in higher-order statistical environments. Episodic memory refers to the ability to remember specific instances from one’s life. Episodic memories are rich in detail, as if the rememberer were re-experiencing the event. In contrast, semantic memory reflects general, verbalizable knowledge that does not depend on re-experiencing the specific learning event. For instance, the question “What did you have for breakfast?” usually evokes an episodic memory in which one in some sense relives one’s most recent breakfast as part of answering the question. In contrast, “What is the capital of Vermont?” does not usually evoke a vivid recollection of the moment when that fact was learned. We have applied the mathematical framework extensively to phenomena from episodic memory experiments. The goal of this subproject was to extend the formalism to account for semantic memory phenomena as well. Although the results are not yet published, we have solved several problems that will be necessary for describing semantic memory.

The basic idea is that meaning in semantic memory is developed largely by learning the temporal roles played by different stimuli. For instance, consider the novel word FLOOB. Prior to experience, there is no information about the meaning of FLOOB. However, after a single exposure in the sentence “The baker reached into the oven and pulled out the FLOOB,” a tremendous amount of information about the meaning of FLOOB is available to the learner. The basic hypothesis is that we can use the current state of temporal history to generate a prediction, and that this prediction can be used to inform the meaning of the stimuli that are experienced (Shankar, Jagadisan, & Howard, 2009; Howard, Shankar, & Jagadisan, 2011).

Previous work used this general approach to learn from a much simpler representation of temporal context. Our long-term goal is to use the richer representation of temporal history we developed to fulfill the same role. Previously we established that the version of the model based on temporal context can learn perfectly any language that can be described by bigram transition probabilities (Shankar et al., 2009), that is, any language in which the next symbol depends only on the immediately preceding symbol. Unpublished analyses have shown that the expanded model should be able to account for lagged bigram languages, that is, languages in which the next symbol can be deduced by independently combining the identity of the previous symbol, the identity of the symbol before that, and so on. This limitation is actually a huge advantage over trying to learn N-gram languages, which would require an astronomical number of observations to learn the probabilities of non-independent combinations of symbols. Moreover, the lags become less precise the further in the past one looks. That is, the model distinguishes the symbol at lag 1 from the symbol at lag 2 with more temporal resolution than it distinguishes the symbol at lag 11 from the symbol at lag 12. Subject to this constraint, the model can capture long-range correlations among symbols. We suspect that, because language includes constraints over a wide variety of time scales, this property constitutes an important advance over existing computational models of semantic memory (e.g., Landauer & Dumais, 1997; Griffiths, Steyvers, & Tenenbaum, 2007).
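The difference between lagged bigram and N-gram statistics can be illustrated with a small Python sketch. It is a count-based illustration of the kind of statistics involved, not our model: it estimates a separate table P(next symbol | symbol ℓ steps back) for each lag ℓ and combines the tables additively across lags, so that no table ever conditions on a combination of symbols. The toy language, the function names and the lag weights are hypothetical. In the toy sequence, the symbol after B is C or E with equal frequency, and only the symbol two steps back disambiguates it.

```python
import numpy as np


def lagged_bigram_tables(seq, vocab, max_lag):
    """P[lag - 1, a, b] estimates P(next symbol = b | symbol `lag` steps back = a)."""
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((max_lag, len(vocab), len(vocab)))
    for pos in range(len(seq)):
        for lag in range(1, max_lag + 1):
            if pos - lag >= 0:
                C[lag - 1, idx[seq[pos - lag]], idx[seq[pos]]] += 1
    totals = C.sum(axis=2, keepdims=True)
    P = np.divide(C, totals, out=np.zeros_like(C), where=totals > 0)
    return P, idx


def predict_next(history, P, idx, lag_weights):
    """Add the lag-specific predictions independently, one lag at a time."""
    score = np.zeros(P.shape[2])
    for lag, w in enumerate(lag_weights, start=1):
        if lag <= len(history):
            score += w * P[lag - 1, idx[history[-lag]]]
    return score / score.sum()


vocab = list("ABCDE")
seq = "ABCDBE" * 10   # toy language: what follows B depends on the symbol two steps back
P, idx = lagged_bigram_tables(seq, vocab, max_lag=2)
after_AB = predict_next("AB", P, idx, lag_weights=[1.0, 1.0])
after_DB = predict_next("DB", P, idx, lag_weights=[1.0, 1.0])
print(vocab[after_AB.argmax()], vocab[after_DB.argmax()])   # prints: C E
```

The sketch uses exact lags; in the model, lags are represented with decreasing precision further into the past, so distant lags would blur together.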

As a practical implementation matter, we have determined that the generalization from temporal context to temporal history is not a major obstacle, because temporal history is composed from states of temporal context by a linear operator. The major open problem to be solved in describing language is polysemy. In practice, language cannot simply be described as a lagged bigram language. The problem is that polysemous symbols (words with multiple meanings) play different roles in different contexts. This prevents the statistics from being expressed as a lagged bigram language: if what follows the word FLY depends on whether it appears in a discussion of baseball or a discussion of insects, we cannot treat the effect of that symbol independently of the other symbols nearby. The solution is to generate a new set of symbols in which polysemous meanings are distinguished by their context. This amounts to a blind source separation problem, which is relatively well understood. We have secured funding from the NSF (BCS-1058937) to pursue this line of research and to construct a large-scale computational model from naturally occurring text.

Detailed scientific description of results 

Taken as a whole, the set of behavioral findings accommodated in this mathematical framework reflects a major advance towards a satisfactory theory of memory. Here we describe the model and the results in more detail so that the interested reader can appreciate these deep connections between fields. By way of introduction, we first briefly recap the formalism of the model and the behavioral simulations. These are described in more detail elsewhere (Shankar & Howard, 2012; Howard et al., In preparation).
Figure 1. Schematic illustrating the construction of T from t and the role of M. a. In this schematic, the stimulus dimensions (columns) have been ordered according to the sequence in which the stimuli were presented. Dark values correspond to higher levels of activation. Different rows correspond to different values of s. Each row of t represents the stimuli as an exponential function. The operator $L_k^{-1}$ takes t into T at each moment. Stimuli presented in the recent past have a concise representation; stimuli further in the past have overlapping representations. For visual clarity, values in T have been scaled to their maximum value. b. At each moment, the current value of f is used to update the current value of t. The current value of t is used to update T, which is, in turn, used to update M. The current value of M and the current state of T are used to generate a prediction p.


Representing the past 

Given a stimulus history $f(\tau' < \tau)$ leading up to the present moment $\tau$, the goal is to represent that function using only operations that are local in time. This can be accomplished by using a set of leaky integrators $t(s)$, each of which obeys the differential equation

$$\frac{d\,t(\tau, s)}{d\tau} = -s\, t(\tau, s) + f(\tau), \qquad (1)$$

where $\tau$ is physical time and $s$ is the decay rate of each integrator. The solution to Eq. 1 is just the Laplace transform of $f(\tau' < \tau)$:

$$t(\tau, s) = \int_{-\infty}^{\tau} f(\tau')\, e^{s(\tau' - \tau)}\, d\tau'. \qquad (2)$$

This means that $t(s)$ contains all of the information in the stimulus history up to that point. Armed with this insight, we can construct an approximation to the inverse Laplace transform. It turns out that a method attributable to Post (1930) has just the properties we want. Referring to the approximation operator for integer $k$ as $L_k^{-1}$, we write

$$T(\tau^*) = L_k^{-1}\, t(s) \qquad (3)$$

$$L_k^{-1}\, t(s) = \frac{(-1)^k}{k!}\, s^{k+1}\, t^{(k)}(s), \qquad (4)$$

where $t^{(k)}(s)$ is the $k$th derivative of $t$ with respect to $s$, evaluated at $s = -k/\tau^*$. The approximation approaches the inverse Laplace transform as $k \to \infty$. $T(\tau^*)$ approximates the stimulus history as a function of internal time $\tau^*$, with negative values of $\tau^*$ indexing the past. Figure 1a provides a schematic summary of the construction of $T(\tau^*)$ from $t(s)$ at any given moment.
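To make Eqs. 1-4 concrete, here is a minimal Python sketch for a single stimulus dimension. It is not our R library: it assumes unit time steps (so Eq. 1 becomes the update t ← e^{-s} t + f), a hypothetical dense grid of s values, and k = 4, and it takes the kth derivative numerically by repeated central differences. For a single delta-function input presented a steps ago, Post's formula gives $T \propto s^{k+1} a^k e^{-sa}$, which peaks at $\tau^* = -ka/(k+1)$, so the reconstruction places the stimulus near its true age, blurred in proportion to that age.

```python
import numpy as np
from math import factorial


def encode_history(f, s):
    """Discrete form of Eq. 1 with unit time steps: t <- exp(-s) * t + f at every step."""
    rho = np.exp(-s)                 # one decay factor per leaky integrator
    t = np.zeros_like(s)
    for f_tau in f:
        t = rho * t + f_tau
    return t


def post_inverse(t, s, k):
    """Post's approximation (Eq. 4): T = (-1)^k / k! * s^(k+1) * d^k t / ds^k.

    The k-th derivative is taken by k passes of central differences. The first
    and last k grid points are corrupted by the one-sided edge differences, so
    they are dropped. Each s is paired with internal time tau* = -k / s.
    """
    deriv = t.copy()
    for _ in range(k):
        deriv = np.gradient(deriv, s)
    T = (-1) ** k / factorial(k) * s ** (k + 1) * deriv
    keep = slice(k, len(s) - k)
    return -k / s[keep], T[keep]


s = np.linspace(0.05, 5.0, 5000)     # hypothetical grid of decay rates
f = np.zeros(21)
f[10] = 1.0                          # one delta-function input, 10 steps before the last update
t = encode_history(f, s)
tau_star, T = post_inverse(t, s, k=4)
print(round(tau_star[np.argmax(T)], 1))   # prints -8.0, i.e. -k * age / (k + 1)
```

Because numerical differentiation amplifies errors near the ends of the s grid, the sketch discards the k outermost points on each side; larger k would call for a more careful scheme.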

Predicting the future 

At each moment in time, we can not only remember the past but also use history to predict the stimulus that will be presented at the next moment. This can be accomplished by comparing the current state of T to the previous states of T in which each stimulus was experienced. Following TCM (Howard & Kahana, 2002; Howard, Fotedar, Datey, & Hasselmo, 2005; Sederberg, Howard, & Kahana, 2008), at each moment we form the outer product of the currently experienced stimulus and the current state of T and add it to a tensor M:

$$\frac{dM}{d\tau} = |f(\tau)\rangle\langle T(\tau)|. \qquad (5)$$

The statistics of the environment are stored in M (see also Figure 1b). Each stimulus is associated with the superposition of the states of temporal history in which it was experienced. Again following TCM, we can then use the current state of temporal history as a probe to predict which stimulus will be presented at the next time step:

$$|p(\tau)\rangle = M\, |T(\tau)\rangle, \qquad (6)$$

where p is referred to as the prediction vector. Combining Eqs. 5 and 6, we see that at each time step each stimulus is predicted to the extent that the current state of T overlaps with the states of T in which that stimulus was encoded.

Note that the summation on the right-hand side of Eq. 6 is weighted by $g(\tau^*)$, the number density of nodes representing a particular value of $\tau^*$. If $N$ indexes the cells, $g(\tau^*) = dN/d\tau^*$. More explicitly,

$$|p\rangle = \int M_{\tau^*}\, g(\tau^*)\, |T_{\tau^*}\rangle\, d\tau^*, \qquad (7)$$

where the subscript notation $M_{\tau^*}$ refers to the matrix within the three-tensor M corresponding to a specific value of $\tau^*$. Similarly, $|T_{\tau^*}\rangle$ refers to the vector within T corresponding to a particular value of $\tau^*$.
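A minimal Python sketch of Eqs. 5-7 for discrete stimuli follows. To stay self-contained it computes T in closed form: by the linearity of $L_k^{-1}$, a stimulus presented a steps ago contributes $s^{k+1} a^k e^{-sa}/k!$ at the node with $s = -k/\tau^*$ (Eq. 4 applied to $e^{-sa}$). The log-spaced nodes, k = 4, the node range, the function names and the training sequence are hypothetical choices; because the nodes are log-spaced, a plain sum over nodes already carries the weighting by $g(\tau^*)$. After training on a repeating A, B, C sequence, a history ending in A should predict B.

```python
import numpy as np
from math import factorial

K = 4
TAU_STAR = -np.geomspace(1.0, 50.0, 60)   # hypothetical log-spaced nodes (negative = past)
S = -K / TAU_STAR                          # decay rate paired with each node


def history_T(seq, vocab):
    """T (nodes x stimuli) for a sequence of discrete stimuli, using the closed form
    of Post's formula for delta-function inputs, summed over presentations."""
    T = np.zeros((len(TAU_STAR), len(vocab)))
    for pos, item in enumerate(seq):
        age = len(seq) - pos               # the most recent item has age 1
        T[:, vocab.index(item)] += S ** (K + 1) * age ** K * np.exp(-S * age) / factorial(K)
    return T


def train_M(seq, vocab):
    """Eq. 5: add |f><T| for each stimulus, using the history that preceded it.
    M[i] is the (nodes x stimuli) slice that predicts output stimulus i."""
    M = np.zeros((len(vocab), len(TAU_STAR), len(vocab)))
    for n, item in enumerate(seq):
        M[vocab.index(item)] += history_T(seq[:n], vocab)
    return M


def predict(M, T):
    """Eqs. 6-7: p_i = sum over nodes j of M_{tau*_j} applied to T_{tau*_j}."""
    return np.einsum("ijk,jk->i", M, T)


vocab = ["A", "B", "C"]
M = train_M(["A", "B", "C"] * 10, vocab)
p = predict(M, history_T(["A", "B", "C"] * 10 + ["A"], vocab))
print(vocab[int(np.argmax(p))])   # prints B
```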


Predicting the future using backward replay. It would be extraordinarily useful if we were able to predict not only the next moment but also a trajectory of future states leading from the present. There are many ways that one could conceivably implement a representation like T. One of the major advantages of the approach we developed in the previous program period is that it provides a concise, physical mechanism for calculating such trajectories into the future.

Note that we can write a discrete version of the differential equation encoding the Laplace transform (Eq. 1):

$$t_{\tau+1}(s) = R\, t_{\tau}(s) + f_{\tau}. \qquad (8)$$

Here the operator R (in analogy with $\rho$ in TCM) is just a diagonal matrix with $\rho = e^{-s}$ on the diagonal entry corresponding to each value of $s$ in the sheet $t(s)$.

In order to construct an estimate of what stimulus will be available $\delta$ steps in the future, we need to estimate the state of T that will obtain at that time. That is, we can estimate $p(\tau + \delta)$ by operating on $T(\tau + \delta)$ with M. The difficulty is obtaining $T(\tau + \delta)$ at time $\tau$ without waiting $\delta$ additional time steps. As it turns out, this can be accomplished by simply altering the weights in $L_k^{-1}$. To see how this is possible, note that, in the absence of further input,

$$T(\tau + \delta, \tau^*) = L_k^{-1}\, t(\tau + \delta, s) \qquad (9)$$

$$= L_k^{-1} \left[ R^{\delta}\, t(\tau, s) \right] \qquad (10)$$

$$= \left[ L_k^{-1} R^{\delta} \right] t(\tau, s). \qquad (11)$$
Figure 2. Schematic illustrating important features necessary for predicting trajectories forward in time. a. Schematic shows $t(s)$ for an idealized situation in which the columns correspond to distinct stimuli in the order in which they were presented. Note the exponential gradient. The operator that approximates inversion of the Laplace transform, $L_k^{-1}$, is multiplied by $R^{\delta}$. As a result, T is shifted to approximate where those components would be $\delta$ steps in the future. Compare to Figure 1a. b. Characterization of the change to the weights of $L_k^{-1}$. The center represents the weight along the diagonal. Flanking values give the weights of the adjacent positions within a row. Dark curve: $L_k^{-1}$. Lighter curve: $L_k^{-1} R^{\delta}$. When operated on by $R^{\delta}$, the weights are shifted along the diagonal and reduced in magnitude (not to scale).


That is, we can understand $R^{\delta}$ as operating on $t(\tau, s)$ as in Eq. 10, or as being absorbed into the weights of $L_k^{-1}$ as in Eq. 11. Because T is constructed from $t(s)$ using known weights, and because $t(s)$ evolves deterministically in the absence of new input, it is possible to infer the T that would obtain in the future and to express it simply by transiently changing the weights of $L_k^{-1}$ appropriately.
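Because $L_k^{-1}$ is linear, Eq. 11 can be checked directly as a matrix identity. The Python sketch below is an illustration under stated assumptions, not our implementation: it discretizes $L_k^{-1}$ as a matrix built from a finite-difference derivative on a hypothetical grid of s values (k = 4, unit time steps), absorbs $R^{\delta}$ into that matrix, and confirms that applying the modified operator to the present t gives the same T as waiting $\delta$ steps with no input and inverting in the usual way. The identity holds regardless of how accurate the discretized inverse is, since both sides use the same matrix.

```python
import numpy as np
from math import factorial

k, delta = 4, 3
s = np.linspace(0.1, 4.0, 200)        # hypothetical grid of decay rates
rho = np.exp(-s)                      # diagonal of R in Eq. 8 (unit time step)

# Matrix form of L_k^{-1}: (-1)^k / k! * diag(s^(k+1)) @ D^k, where D is the
# finite-difference d/ds obtained by differentiating the columns of the identity.
D = np.gradient(np.eye(len(s)), s, axis=0)
L_inv = (-1) ** k / factorial(k) * np.diag(s ** (k + 1)) @ np.linalg.matrix_power(D, k)

# Present state: one delta-function input followed by five empty time steps.
t = np.zeros_like(s)
for f_tau in [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]:
    t = rho * t + f_tau

# Eq. 11: absorb R^delta into the weights and apply the result to the present t.
L_translated = L_inv @ np.diag(rho ** delta)
T_predicted = L_translated @ t

# Eq. 9: let delta steps elapse with no input, then invert as usual.
t_later = t.copy()
for _ in range(delta):
    t_later = rho * t_later
T_later = L_inv @ t_later

print(np.allclose(T_predicted, T_later))   # prints True
```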


A principled form for $g(\tau^*)$. In the initial publications describing T, we kept $g(\tau^*)$ in Eq. 7 as a general function. However, it turns out that there is a principled choice for $g(\tau^*)$. Consider the distributions across $\tau^*$ caused by a more recent stimulus and by a less recent stimulus. For the less recent stimulus, the function is broader about its peak. It does not make sense to devote as many cells to the broad peak as to the narrower one. We calculated the function $g(\tau^*)$ that equalizes the information per cell. This turns out to be $g(\tau^*) \propto 1/|\tau^*|$. Interestingly, this implies that the cells are logarithmically distributed. That is, internal, retrospective time obeys the Weber-Fechner law. Simultaneously, timing going forward (as in, say, a temporal bisection task) obeys the scalar property, with a linear relationship between predicted time and physical time. This is possible because forward-going timing is governed by the match of an entire T summed over $\tau^*$. Both the encoding T and the retrieval T are logarithmically distributed, so scalar timing is observed (see Howard et al., In preparation, for details). Subsequent work has shown that this choice of $g(\tau^*)$ also equalizes the mutual information between adjacent cells within $T(\tau^*)$ (Shankar & Howard, submitted).
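A short Python sketch of what $g(\tau^*) \propto 1/|\tau^*|$ means for a finite population of cells, assuming a hypothetical 300 nodes log-spaced between 1 and 1000 time steps: the number density estimated from the spacing between neighbouring nodes falls off as $1/|\tau^*|$, and each factor-of-ten range of the past receives the same number of nodes.

```python
import numpy as np

N = 300
tau_abs = np.geomspace(1.0, 1000.0, N)     # |tau*| of each node (hypothetical range)

# Estimate the number density g = dN/d|tau*| from the spacing between neighbours.
g = 1.0 / np.diff(tau_abs)
midpoints = 0.5 * (tau_abs[1:] + tau_abs[:-1])
print(np.allclose(g * midpoints, (g * midpoints)[0]))   # prints True: g is proportional to 1/|tau*|

# Equivalently, every factor-of-ten range of the past receives the same number of nodes.
counts, _ = np.histogram(tau_abs, bins=[1.0, 10.0, 100.0, 1000.0])
print(counts)   # prints [100 100 100]
```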

Behavioral applications 

The formalism described above provides a framework for remembering a history leading up 
to the present moment and generating a future trajectory. Here we summarize some highlights of 
the behavioral applications of the model we developed in the preceding period. 
Figure 3. Hacker (1980) gave subjects a rapidly presented list of letters followed by a choice between two probe letters. Subjects were instructed to choose the more recent probe. Reaction times (RTs) for correct responses, in which the subject chooses the more recent probe, depend strongly on the recency of the more recent probe but are essentially independent of the recency of the less recent probe. In contrast, error RTs, in which the subject selects the less recent probe, depend strongly on the recency of the less recent probe but are essentially unaffected by the recency of the more recent probe. Top: data. Bottom: model. This is a one-parameter fit of the model.


Episodic memory across scales. Having a representation of history enables a description of previously puzzling results. It also allows for an account of phenomena across scales that had previously been attributed to separate memory stores operating at different time scales. We have applied the model extensively to two episodic memory tasks, free recall and the judgment of recency task. In free recall, the model generates scale-free recency and contiguity (Howard et al., In preparation). While scale-free recency has been assumed by other models (e.g., Brown, Neath, & Chater, 2007), we provided a principled process model that generates it as an emergent property of the equations given above. Scale-invariant recency at the behavioral level is a natural consequence of scale-invariance at the level of the representation. Scale-persistent contiguity is accomplished in our framework by assuming that the $t(s)$ that obtained at a previous moment in time can be recovered. This in turn enables recovery of the state of T that obtained at that time. Unlike previous TCM-based accounts, whose scale is fixed by the specific value of $\rho$ used, this contiguity effect persists across arbitrarily large scales. It is not precisely scale-invariant because the asymmetry observed in the contiguity effect decreases as the scale is increased. This appears to be consistent with experimental data using final free recall (Unsworth, 2008).

[Figure 4 panels: Yntema & Trask (1963) data and model; x-axis: recency of item A.]

Figure 4. Yntema and Trask (1963) gave subjects a continuous relative judgment of recency task. The figure shows the probability of choosing a probe item as more recent as a function of its recency and the recency of the other probe item. Model predictions use the same scanning model as used in Figure 3. The model has one free parameter.

The major difference between $T(\tau^*)$ and previous models of temporal context is that $T(\tau^*)$ retains information about when the preceding stimuli were presented. To illustrate the advantage of this approach, we modeled the judgment of recency (JoR) task.
In the relative JoR task, subjects are given two probe stimuli and asked to choose the one that was presented more recently. With fast presentation of short lists, the JoR task generates results that have generally been taken as evidence for serial scanning. The data appear as if the subject examines the historical record in memory moment by moment, stopping upon finding either of the probes (Figure 3; Hacker, 1980; McElree & Dosher, 1993; Muter, 1979). We were able to account for these findings (Howard et al., In preparation) simply by assuming that the subject examines $T(\tau^*)$ at successive values of $\tau^*$, starting at zero and extending backwards in time. At each step, the subject detects a memory for a probe stimulus with probability proportional to the value of T in the appropriate stimulus column. When the subject detects a memory for one of the probes, she chooses it; if the scan goes on for long enough without detecting a memory, the subject guesses. This extremely simple model is able to account for the qualitative pattern of results in accuracy, mean correct RT, and error RT (Figure 3).
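A minimal Python sketch of the scanning model just described, computing choice probabilities exactly rather than by simulation. The following assumptions are ours, for illustration only: each probe is a single delta-function input whose reconstruction is given by the closed form of Eq. 4 with k = 4; the scan visits integer values of $|\tau^*|$ starting at 1; the detection probability is c·T with a hypothetical constant c = 0.5; simultaneous detections are split evenly; and an unfinished scan ends in a 50/50 guess. RTs are not modeled here.

```python
import numpy as np
from math import factorial


def T_value(age, tau, k=4):
    """Closed-form Post reconstruction (Eq. 4) of one delta-function input
    presented `age` steps ago, read out at internal time tau* = -tau."""
    s = k / tau
    return s ** (k + 1) * age ** k * np.exp(-s * age) / factorial(k)


def scan_choice(age_a, age_b, c=0.5, n_steps=500, k=4):
    """Scan tau* = -1, -2, ...; at each step each probe is detected with
    probability c * T. Returns P(choose A), P(choose B), P(guess)."""
    p_a = p_b = 0.0
    survive = 1.0                       # probability that nothing has been detected yet
    for tau in range(1, n_steps + 1):
        q_a = min(1.0, c * T_value(age_a, tau, k))
        q_b = min(1.0, c * T_value(age_b, tau, k))
        p_a += survive * (q_a * (1 - q_b) + 0.5 * q_a * q_b)   # ties split evenly (assumption)
        p_b += survive * (q_b * (1 - q_a) + 0.5 * q_a * q_b)
        survive *= (1 - q_a) * (1 - q_b)
    return p_a, p_b, survive


p_a, p_b, p_guess = scan_choice(age_a=2, age_b=8)
accuracy = p_a + 0.5 * p_guess          # A is the more recent probe
print(accuracy)
```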

Most scholars of memory have treated the Hacker (1980) results as evidence for a dedicated short-term store that is contrasted with long-term memory. However, we showed that the simple model described above also accounts for several phenomena from what are usually thought of as long-term JoR tasks. Figure 4 shows that the same model provides an excellent qualitative description of the experimental results of Yntema and Trask (1963), who performed a continuous relative JoR task with recencies chosen from a very broad temporal range.

In addition, we provided a principled account of several other findings in what are usually thought of as long-term JoR tasks, including the logarithmic increase with true recency of absolute JoRs, in which subjects provide a numerical estimate of how many steps in the past a stimulus was presented (Hinrichs & Buschke, 1968; Hinrichs, 1970), and the approximate independence of separate judgments based on distinct presentations of an item (Hintzman, 2010). The former follows from the choice of $g(\tau^*)$ and the latter follows from the linearity of $L_k^{-1}$. We also showed that failures to observe substantial performance in the relative JoR task (Klein, Shiffrin, & Criss, 2007) can be accounted for by the choice of relative and absolute delays those authors used.
Figure 5. Procedure of a typical temporal mapping experiment. a. In the first phase of training, one group (top) received CS1 with the shock US presented immediately after CS1 offset. The other group (bottom) received the US five seconds after offset of CS1. b. Both groups received second-order conditioning between CS1 and CS2. c. The presumed representation after both phases of training for each group if the experiences were aligned on CS1 onset. In the second group, CS2 predicts US onset more robustly than in the first group. After Cole, Barnet, & Miller (1995).


Temporal mapping. Ralph Miller and colleagues have conducted a series of elegant conditioning experiments that suggest two fundamental properties of learning that are particularly well suited to the current mathematical framework. First, they argue that the temporal relations between stimuli form an essential and unavoidable part of the learning event. Second, they argue that learners can integrate disparate learning events into a coherent temporal map by aligning different time lines on a common stimulus.

To make this more concrete, let us describe a specific experiment (Cole et al., 1995). Rats were trained to associate a 5 s CS1 with a US (shock). In one condition, the time between offset of CS1 and onset of the US was 0 s. In the other condition, the time between the offset of CS1 and the US was 5 s (Figure 5a). Let us refer to these as the 0 s and 5 s conditions, respectively. After training the CS1-US association, a second-order association was formed between CS1 and another 5 s stimulus, CS2. In both conditions, the onset of CS2 immediately followed the offset of CS1 (Figure 5b). In neither condition did CS2 ever co-occur with the US. The first finding was, not surprisingly, that the CR to CS1 was stronger in the 0 s condition than in the 5 s condition. If associative strength were a scalar value, we would expect second-order conditioning to CS2 also to be stronger in the 0 s condition than in the 5 s condition. However, exactly the opposite was observed. This result makes no sense from the perspective of simple associative strength. Miller’s temporal coding hypothesis (Matzel, Held, & Miller, 1988; Savastano & Miller, 1998) reconciles these findings as follows. Note that if the two learning episodes are aligned on CS1 (as in Figure 5c), then CS2 does not predict the onset of the US in the 0 s condition. In the 5 s condition, CS2 strongly predicts the onset of the US when the two learning episodes are aligned.
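The alignment argument amounts to simple arithmetic on the event times given above. The Python sketch below, with a hypothetical helper name, places both training phases on a common time line starting at CS1 onset and reads off where the US falls relative to CS2 onset; the US is treated as a point event, which is a simplifying assumption.

```python
# Event times in seconds from CS1 onset, following the design described above
# (5 s CS1 and CS2; US onset 0 s or 5 s after CS1 offset).
CS_DURATION = 5.0


def us_onset_relative_to_cs2(gap):
    """Hypothetical helper: latency of the US after CS2 onset once the two phases
    are aligned on CS1 onset."""
    phase1_us_onset = CS_DURATION + gap     # phase 1: CS1, then the US after `gap` seconds
    phase2_cs2_onset = CS_DURATION          # phase 2: CS2 begins at CS1 offset
    # Both phases already start at CS1 onset (time 0), so subtracting gives the
    # position of the US relative to CS2 onset in the combined temporal map.
    return phase1_us_onset - phase2_cs2_onset


for gap in (0.0, 5.0):
    print(f"{gap:.0f} s condition: US expected {us_onset_relative_to_cs2(gap):.0f} s after CS2 onset")
# prints:
# 0 s condition: US expected 0 s after CS2 onset
# 5 s condition: US expected 5 s after CS2 onset
```

In the 0 s condition the US coincides with CS2 onset rather than following it, whereas in the 5 s condition it follows CS2 onset by 5 s, at CS2 offset, which is the sense in which CS2 predicts the US only in the 5 s condition.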

A model must have two basic properties in order to account for this phenomenon. First, the temporal relationships between stimuli, rather than a simple scalar associative strength, must be learned. Second, some mechanism for integrating disparate episodes into a coherent synthetic representation is necessary. The representation of temporal history offered by T satisfies the first constraint. The ability to retrieve temporal contexts satisfies the second constraint (see also Howard et al., 2005; Rao & Howard, 2008). We showed that the mathematical properties of the model enable it to construct satisfactory temporal maps between disparate experiences (Howard et al., In preparation).